I.  Introduction

In  the  construction  of  high  speed  computational  facilities  it  seems 
of  great  importance  to  produce  as  much  computation  speed  per  tube  used  as 
possible.  As  the  tubes  themselves  are  intrinsically  very  fast,  a  certain 
basic  speed  is  attainable  with  nothing  but  simple  tube  circuits  in  more  or 
less  conventional  forms.  This  speed  may  be  improved  upon  to  a  certain 
extent  by  making  the  circuits  somewhat  more  powerful,  by  making  some  effort 
to  minimize  circuit  capacities  and  by  using  tubes  with  higher  figures  of 
merit.  This  simple  increase  in  speed  seems  to  saturate  for  normal  direct 
coupled  circuits  at  a  point  where  basic  transfers  take  place  in  the  region 
of 50 to 100 millimicroseconds. It should be stressed that even this speed is
attained  with  some  effort. 

In  some  circuits  it  is  found  to  be  possible  to  reduce  the  operation 
times  marginally  by  increasing  the  number  or  power  of  the  driving  tubes 
considerably. This process is far from linear, so that doubling the
number of tubes changes the speed by far less than a factor of two.
This is evidently not too satisfactory as a solution, for
in  nearly  all  cases  the  doubling  of  the  tube  count  would  enable  one  almost 
to  double  the  effective  computation  speed  were  these  tubes  used  in  two  like 
circuits.  Thus,  after  the  point  has  been  reached  where  a  given  number  of 
added  tubes  does  not  add  a  proportional  increase  in  speed,  it  would  seem 
that parallelism should be followed where this leads to more or less
proportional increases in overall speed for the added expenditure of tubes. Of
course,  even  this  process  will  saturate  eventually  for  a  few  problems  and 
still  later  for  all  sufficiently  complex  calculations.  The  computer  which 
has  very  high  individual  circuit  speeds  does  have  one  field  in  which  it  is 
absolutely supreme. This field includes all problems in which a sequence
of very simple operations must be performed, no effort in the parallel
direction being possible, and in which the result of each step is required
before the next step can proceed. Such a problem will be solved just as
rapidly on a serial machine as on a parallel one.
As is fairly extensively recognized, the class of such problems is rather
restricted.

The use of increased parallelism may be shown to be of great use
in the performance of certain arithmetic and transfer operations.

It has been shown that it is possible essentially to parallel
the carry chain so that the carry sequence is at least partially calculated
in parallel. This method reduces the carry time to something like 1/10th
of the normal time for expenditure of about as many more tubes as originally
used  in  the  adder.  With  this  scheme  plus  a  speeding  up  of  the  circuits 
which  pass  straight  through  the  adder  it  has  been  found  to  be  feasible  to 
produce  an  adder  which  settles  down  to  the  correct  solution  in  a  time  in 
the  neighborhood  of  0.1  microsecond  after  an  input  number  is  changed. 

By employing a multiplicity of adders, further effective increases
in the computation speed result. By using n adders with a settle down time
of T_1 microseconds plus a transfer time of T_2 microseconds, an m digit
multiplication may be completed in about

$T = T_1 (m/n) + T_2 (m/n) = (m/n)(T_1 + T_2)$ microseconds.


Some difficulty will be found in surpassing or even approaching the speeds
achieved by parallelism if more serial methods are used, even with twice
the expenditure of tubes. It could be argued that the speed of the parallel
arrangement might itself be further increased by using faster circuitry, but
this same additional expenditure of equipment may more profitably be used to
further increase the parallelism.
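
The estimate above may be illustrated by a short calculation. The following
is a minimal sketch only, reading the formula as T = (m/n)(T_1 + T_2); the
function name and argument names are hypothetical, and the sample figures
(40 digits, 4 adders, 0.11 and 0.1 microsecond) are those quoted later in
the discussion of the four adder array.

```python
def multiplication_time_us(m_digits, n_adders, settle_us, transfer_us):
    """Estimate T = (m/n) * (T_1 + T_2) in microseconds.

    m_digits    -- number of multiplier digits (m)
    n_adders    -- number of adders working in parallel (n)
    settle_us   -- settle down time of the adders, T_1 (assumed input)
    transfer_us -- transfer time, T_2 (assumed input)
    """
    steps = m_digits / n_adders          # number of add-and-transfer steps
    return steps * (settle_us + transfer_us)


if __name__ == "__main__":
    # Forty digits taken four at a time: ten steps of 0.11 + 0.1 microsecond.
    print(round(multiplication_time_us(40, 4, 0.11, 0.1), 3))
```

Doubling the number of adders halves the number of steps, which is the
nearly proportional gain argued for in the text.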

II.  Discussion  of  some  parallel  circuits. 

In  order  to  produce  increased  speed  in  the  registers  by  parallelism, 
two  courses  of  action  may  be  taken.   One  of  these  is  to  allow  the  transfer 
of  information  between  more  than  one  pair  of  registers  at  a  time.   This 
will  allow  for  moving  more  than  one  number  at  a  time  when  necessary.  A 
second  method  of  procedure  utilizes  a  switching  plus  gating  process  for 
transferring  information  rather  than  a  gating  procedure  alone.   In  the  case 
of the production of shifting facilities, this process is remarkably
useful. By utilizing a multiple depth shift the time necessary to perform a
shift of n places becomes essentially independent of n. The system operates
by arranging a set of binary switches in series. The method is illustrated
in Figure 1 for four digits. Provisions are made in this logical
arrangement for a cyclic left shift of up to three places. The two flipflops
on the left hold the binary number which specifies the number of shifts to be
performed.


Figure  1.   Multiple  Depth  Shift 

The  circuit  of  Figure  1  utilizes  a  multitude  of  binary  switches 
of the kind shown in Figure 2.

Figure 2.  Binary Switch for Multiple Depth Shift


It is evident that the output shown in Figure 2 is equal to input
B if F = 0 and it is equal to input A if F = 1. At each level the inputs
are switched so that they appear at the output displaced by $2^i$ places if
the shift digit in the $2^i$ place of the shift number is a 1 and are not
displaced at all if the shift digit is a 0. By passing through a series
of  such  circuits  the  outputs  become  displaced  with  respect  to  the  input  by 
a  number  of  places  equal  to  the  shift  number.  The  time  of  transit  through 
the  circuit  is  practically  independent  of  the  number  of  shifts  executed 
since  the  number  of  elements  through  which  the  signals  pass  in  any  case  is 
the  same.   In  the  circuit  illustrated  in  Figure  1  the  output  signals  pass 
through  a  series  of  four  logical  elements  in  coming  from  the  input. 
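
The switching scheme just described can be sketched as a logical model in
Python. This is an illustration only, not a circuit; the function names and
the convention that the word is listed most significant digit first are
assumptions made for the sketch.

```python
def binary_switch(a, b, f):
    """Two input switch: output is input A when F = 1, input B when F = 0."""
    return a if f == 1 else b


def multiple_depth_shift(bits, shift_digits):
    """Cyclic left shift using one level of switches per shift digit.

    bits         -- list of 0/1 digits, most significant digit first
    shift_digits -- binary digits of the shift number, least significant
                    first; level i displaces by 2**i places when its digit is 1
    """
    m = len(bits)
    word = list(bits)
    for i, f in enumerate(shift_digits):
        step = 2 ** i
        # Every output position has its own switch, choosing between the
        # displaced input (A) and the undisplaced input (B).
        word = [binary_switch(word[(k + step) % m], word[k], f)
                for k in range(m)]
    return word


if __name__ == "__main__":
    # Four digits, two shift digits: cyclic left shift of 3 places.
    print(multiple_depth_shift([1, 0, 0, 0], [1, 1]))
```

The signals pass through every level whatever the shift number, which is
why the transit time does not depend on the number of places shifted.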

The system shown in Figure 1, when converted to its circuit
equivalent in a direct fashion, requires about three tubes for each two input,
one output switch. Thus, in order to produce a maximum possible cyclic
shift of $(2^n - 1)$ places in a shift register of m digits, a total of 3mn
double sided tubes are required. Such a shifting device for a 40 digit
shifting register with a maximum shift of 63 places, as in the Illiac,
would be quite expensive. Here n = 6 and m = 40 so the tube count would
be (3)(6)(40) = 720 tubes. Fortunately a better method exists for
producing this two input switch which requires but one tube instead of three.
This circuit is identical to the complement gate in the Illiac in which
the two inputs to be switched appear as high impedance signals at the
grids, the switching signals are applied to the plates, and the low impedance
output appears at the common cathode of a double triode. The circuit enables
the above mentioned shifting circuit to be constructed for mn = (40)(6) = 240
tubes. The switch is shown in Figure 3.


Figure 3.  Switch Circuit


A further economy may be realized if the cyclic shift is not
required, for then the end around carry portion may be omitted. The number
of triode sections required in this case for a maximum shift of $(2^n - 1)$
places in an m digit register is $2mn - \sum_{i=0}^{n-1} 2^i$. An evaluation
for m = 40 and n = 6 gives

$2(40)(6) - \sum_{i=0}^{5} 2^i = 480 - 63 = 417$ triodes.

This is 208.5 double triodes, a saving of 31.5 tubes from the complete case
indicated above.
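
The tube counts quoted for the shifting circuit may be checked with a brief
calculation. The sketch below simply evaluates the expressions 3mn, mn and
2mn minus the sum of 2^i for i from 0 to n-1; the function names are
hypothetical and chosen only for this illustration.

```python
def cyclic_shift_tubes(m, n, tubes_per_switch):
    """Double triodes for a cyclic shifter: one switch per digit per level."""
    return tubes_per_switch * m * n


def noncyclic_triode_sections(m, n):
    """Triode sections when the end around portion is omitted."""
    return 2 * m * n - sum(2 ** i for i in range(n))


if __name__ == "__main__":
    m, n = 40, 6                              # 40 digits, shifts up to 63
    direct = cyclic_shift_tubes(m, n, 3)      # three tubes per switch
    improved = cyclic_shift_tubes(m, n, 1)    # one double triode per switch
    sections = noncyclic_triode_sections(m, n)
    print(direct, improved, sections, sections / 2, improved - sections / 2)
```

With m = 40 and n = 6 this gives 720, 240 and 417 respectively, the last
being 208.5 double triodes, or 31.5 fewer than the cyclic arrangement.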

The speed of operation of a shift using the above process is
dependent upon the time required to set up the switch signals and to allow
the switched signals to pass through the multilayer switch. Since these
all may be rather fast with fairly low impedances, it is conceivable that
an arbitrary length shift can be completed in something like one
microsecond using individual circuits of very little more power consumption than
those used in the Illiac. It will be found to be very difficult to construct
a similarly fast device out of so called "high speed" circuits operating
in a more serial mode in which one shift at a time is executed, even if the
tubes used in the high speed shifting circuit be used in high speed circuitry.
The switch shift system will also be much less critical in its design
since it need not operate so near to the limits of tube (or transistor) speed.

A  second  place  where  great  gains  in  speed  may  be  obtained  through 
the  selective  use  of  parallelism  is  in  the  carry  chain  in  the  adder.  Since 
the  inherent  speed  of  the  adder  is  limited  by  the  time  taken  to  complete  a 
carry,  an  improvement  in  this  time  will  directly  affect  the  time  to  do  an 
addition.  The  carry  into  any  stage  of  a  parallel  adder  is  usually  produced 
by  a  recursive  relation  involving  all  of  the  lesser  significant  stages,  so 
the time for producing the proper carry into all stages is that time
necessary to allow the carry signals to pass down the carry chain. In the worst
case  this  may  require  a  number  of  sequential  operations  equal  to  the  number 
of digits in a word. Such a sequential process is not fundamentally
necessary but is used to decrease the circuit element count. At the expense of
increasing  the  element  count,  the  carry  function  into  any  stage  of  the 
adder  may  be  generated  in  a  more  parallel  fashion.   Since  the  carry  into  a 
given  stage  is  present  if  a  carry  is  originated  at  some  stage  toward  the 
least  significant  end  of  the  adder  and  if  there  is  at  least  a  single  1 
at  each  intervening  stage,  a  separate  high  speed  carry  generation  circuit 
may  be  made  for  each  stage.   Unfortunately  it  becomes  impractical  to  carry 
this  process  to  completion  for  any  large  adder  because  of  the  huge  number 
of  switching  elements  required.  A  practical  compromise  does  exist  however. 
If the complete adder be divided into sections of several digits, all of
the carries within each of these smaller sections may be generated by the
parallel method. Then a sequential carry process may be used which
propagates the carries from section to section only. It has been found that
a  high  speed  carry  circuit  for  10  digits  may  be  constructed  for  about  75 
tubes.  In  the  case  of  the  Illiac,  four  of  these  would  be  required.  The 
maximum  length  of  carry  chain  then  would  be  through  a  total  of  eight 
sequential  logical  elements,  as  compared  to  eighty  in  the  present  system. 
As the sequence is shorter, all, or nearly all, amplifiers may be eliminated
to yield a really high speed carry which is not slowed down by amplifier
stages. Estimates of speed indicate that with fairly standard circuits,
the carry might be completed in 0.1 microsecond. By redesigning the adder
proper  the  propagation  time  straight  through  the  adder  may  be  made  much 
smaller  than  in  the  Illiac  so  that  an  adder  of  40  digits  could  complete 
an  addition  in  about  0.1  microseconds,  the  propagation  through  the  adder 
yielding  a  negligible  increment  of  time  increase  over  the  bare  carry  time. 
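
The sectioned carry scheme can be sketched as a logical model. The Python
below is an illustration under stated assumptions, not the circuit of the
report: the names are hypothetical, the word is taken as 40 digits in four
sections of 10, and each carry within a section is formed directly from the
rule given above (a carry originates at some lower stage and every
intervening stage holds at least a single 1).

```python
def section_carries(a_bits, b_bits, carry_in):
    """Carries into each stage of one section, each formed directly.

    Bits are listed least significant first. Returns (carries, carry_out).
    """
    g = [x & y for x, y in zip(a_bits, b_bits)]  # carry originates here
    p = [x | y for x, y in zip(a_bits, b_bits)]  # at least a single 1 here

    def carry_into(k):
        # Incoming carry passes if all lower stages of the section pass it.
        c = bool(carry_in) and all(p[:k])
        for j in range(k):
            # A carry originated at stage j reaches stage k if stages
            # j+1 .. k-1 each hold at least a single 1.
            c = c or (bool(g[j]) and all(p[j + 1:k]))
        return int(c)

    size = len(a_bits)
    return [carry_into(k) for k in range(size)], carry_into(size)


def sectioned_add(a, b, width=40, section=10):
    """Add two numbers, passing carries sequentially between sections only."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    total, carry = 0, 0
    for start in range(0, width, section):
        sa = a_bits[start:start + section]
        sb = b_bits[start:start + section]
        carries, carry_out = section_carries(sa, sb, carry)
        for i, (x, y, c) in enumerate(zip(sa, sb, carries)):
            total |= (x ^ y ^ c) << (start + i)   # sum digit of each stage
        carry = carry_out
    return total, carry


if __name__ == "__main__":
    print(sectioned_add(5, 7))
```

Only the section to section step is sequential here, so a 40 digit word
needs four such steps rather than forty stage by stage steps.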

In the process of performing a multiplication some considerable
advantage may be taken of parallel adders to further increase the speed of
this process. If it be desired to multiply by n digits of the multiplier
at a time, n adders may be used. They are arranged as shown in Figure 4
for a four adder array.

Figure 4.  Multiplication Using Four Adders

The time taken to do a 4 by 40 bit multiplication in this way is
approximately 0.11 microsecond if addition takes 0.1 microsecond. Thus forty
digit multiplication may be carried out in (10)(4 bit multiplication time)
+ (10)(shift time). This means that with a 0.1 microsecond shift time the
multiplication time will be about 1.1 + 1 = 2.1 microseconds. This is in
the desired speed range. It may be shown that this four adder circuit will
cost in the neighborhood of 2500 double triodes. It is rather doubtful that
the given speed could be approached in even an approximate manner with more
or less serial modes of very high speed circuitry, even with an infinite
expenditure of tubes. It may be that the use of 2500 tubes is not practical,
but  this  system  indicates  that  higher  speeds  are  more  readily  achievable 
by  the  use  of  increased  parallelism  than  with  brute  force  speed  circuitry 
with  the  normal  amount  of  parallelism. 
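
The arithmetic of multiplying by several multiplier digits at a time can be
sketched as follows. This is a model of the arithmetic only, assuming one
adder per multiplier digit in each group; it does not reproduce the wiring
of Figure 4, and the function names and default timings (taken from the
figures quoted above) are assumptions of the sketch.

```python
def multiply_by_groups(multiplicand, multiplier, m=40, n=4):
    """Form a product by consuming n multiplier digits per step.

    Returns (product, steps). In each step, every 1 digit in the group
    contributes one shifted copy of the multiplicand, one per adder.
    """
    product, steps = 0, 0
    for shift in range(0, m, n):
        group = (multiplier >> shift) & ((1 << n) - 1)
        partial = 0
        for j in range(n):                   # one adder per digit of the group
            if (group >> j) & 1:
                partial += multiplicand << j
        product += partial << shift          # shift of n places between steps
        steps += 1
    return product, steps


def estimated_time_us(steps, group_time_us=0.11, shift_time_us=0.1):
    """Total time: (steps)(group multiplication time) + (steps)(shift time)."""
    return steps * group_time_us + steps * shift_time_us


if __name__ == "__main__":
    product, steps = multiply_by_groups(123456789, 987654321)
    print(product == 123456789 * 987654321, steps,
          round(estimated_time_us(steps), 3))
```

For forty digits taken four at a time the loop runs ten times, in agreement
with the ten multiplication steps and ten shifts counted in the text.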

III.  Summary

Because of the above considerations it is felt that it is more
profitable to expend tubes (or transistors) to increase the parallelism of a
computer rather than to use them for increasing the individual circuit speeds
beyond a certain point. This conclusion is based on the following
considerations.

1.  When  the  active  elements  are  not  being  pushed  for  all  possible 
speed,  it  is  possible  to  provide  more  circuit  reliability  since  not  so 
much  effort  needs  to  be  expended  in  cutting  tolerance  corners  to  get  speed. 

2.  After  a  certain  limiting  speed  is  reached  the  additional 
expenditure  in  elements  rises  much  faster  than  the  increase  in  speed.  This 
point  of  diminishing  returns  seems  to  occur,  for  vacuum  tube  circuits,  when 
the  gating  time  has  been  reduced  to  the  50  to  100  milli-microsecond  region. 

3.  When high circuit speeds are used, problems of circuit stability
and  propagation  time  within  the  machine  become  more  unmanageable,  often 
causing  it  to  be  necessary  to  employ  excessively  complex  and  expensive 
construction  techniques  to  secure  any  degree  of  success. 

As a result of this situation, it is proposed that circuit speeds
be increased to the point where further increases sacrifice too much in
tolerance and components to be profitable. Beyond this point some of the
many methods of parallelism may be employed to gain further effective
computation speed.